The safety assessment of high-level radioactive waste repositories requires a high predictive accuracy for
radionuclide diffusion and a comprehensive understanding of the diffusion mechanism. In this study, a
through-diffusion method and six machine-learning methods were employed to investigate the diffusion of ReO₄⁻,
HCrO₄⁻, and I⁻ in saturated compacted bentonite under different salinities and compacted dry densities. The
machine-learning models were trained using two datasets. One dataset contained six input features and 293
instances obtained from the diffusion database system of the Japan Atomic Energy Agency (JAEA-DDB) and
15 publications. The other dataset, comprising 15,000 pseudo-instances, was produced using a multi-porosity
model and contained eight input features. The results indicate that the former dataset yielded a higher predictive
accuracy than the latter. Light gradient-boosting exhibited a higher prediction accuracy (R² = 0.92) and lower
error (MSE = 0.01) than the other machine-learning algorithms. In addition, Shapley Additive Explanations,
Feature Importance, and Partial Dependence Plot analysis results indicate that the rock capacity factor and
compacted dry density had the two most significant effects on predicting the effective diffusion coefficient,
thereby offering valuable insights.
I. INTRODUCTION


China is planning to build a deep geological repository for 
high-level radioactive waste in the Beishan area of Gansu 
Province [1]. Gaomiaozi (GMZ) bentonite from Inner Mon- 
golia was selected as an engineering barrier for the reposi- 
tory because of its high adsorption capacity, low permeability, 
good thermal conductivity, and abundant reserves [2-4]. It is 
a porous clay mineral with a layered structure consisting of 
tetrahedral-octahedral-tetrahedral sheets. Diffusion is the pri- 
mary transport process of radionuclides through the bentonite 
barrier [5]. Anionic radionuclides with long half-lives, such
as ¹²⁹I⁻, ³⁶Cl⁻, SeO₃²⁻, ⁷⁹HSeO₃⁻, ⁹⁹TcO₄⁻, and HTO,
are widely recognized as significant contributors to potential 
long-term dose due to the high diffusivity caused by the an- 
ionic exclusion effect from the negatively charged bentonite 
surface [6, 7]. Therefore, evaluating the release of anionic ra- 
dionuclides from bentonite barriers is important for the safety 
assessment of repositories. 

Among diffusion parameters, the effective diffusion coeffi- 
cient is a critical parameter in safety assessment. It is affected 
by many influencing factors, including porosity, the species 
diffusion coefficient in water, radionuclide concentration
gradient, and tortuosity [8]. Numerous experiments have been
conducted to identify certain influencing factors, including 
the compacted dry density, ionic strength, different types of 
bentonites, and temperature [9-12]. The relationship between 
these factors and radionuclide diffusion has been established. 
For example, the effective diffusion coefficient increases with 
a decreasing compacted dry density [13-18] and increasing 
ionic strength [10, 19-24]. Bentonites with a high montmo- 
rillonite content exhibit better radionuclide retardation owing 
to their low effective diffusion coefficient [7, 13, 20, 25]. Fur- 
thermore, the relationship between the effective diffusion co- 
efficient and temperature has been described using the Ar- 
rhenius equation [20, 26]. Several numerical models, includ- 
ing the multi-porosity model [27, 28], integrated sorption and 
diffusion models [19], and pore-scale models [9, 29], have 
been used to predict the effective diffusion coefficient and an- 
alyze the impact of these influencing factors. These models 
have generated theoretical results that align with experimen- 
tal results. However, few studies have reported quantitative 
metrics, such as the coefficient of determination (R²) or mean
square error (MSE), to assess the models’ predictive accuracy. 


Machine-learning methods can perform regression analy- 
sis and interpret non-linear relationships and multi-factor sit- 
uations, making them valuable tools in engineering appli- 
cations [30, 31]. Numerous studies have used machine- 
learning methods, such as artificial neural networks (ANNs) 
and gradient-boosting models, to estimate the chloride dif- 
fusion coefficient in cement [32]. The predictive accuracy 
can be increased by incorporating physical information into 
the model [33]. These studies implemented techniques such
as Individual Conditional Expectation (ICE), Shapley Additive
Explanations (SHAP), and Partial Dependence Plots
(PDPs) to analyze the weight of the influencing factors on 
chloride diffusion [34]. Regression analysis has been used 
to predict the chloride diffusion coefficient, with input fea- 
tures ranging from 4 to 23 and experimental instances rang- 
ing from 72 to 843 [32, 34-37]. Recently, Light Gradient- 
Boosting (LightGBM) and ANN algorithms were developed 
to predict the effective diffusion coefficient of Re(VII) us- 
ing pseudo-instances produced from a multi-porosity model. 
The ANN algorithm achieved an R² of 0.97, whereas LightGBM
achieved an R² of 0.92 [27]. However, few studies
have explained the correlation between the influencing factors 
and the effective diffusion coefficient of radionuclides using 
machine-learning models. 

In this study, machine-learning models were employed to 
investigate the diffusion of several simulated radionuclide
anions (ReO₄⁻ as an analogue for ⁹⁹TcO₄⁻, HCrO₄⁻ as
an analogue for some redox-sensitive monovalent radionuclide
anions, and I⁻ as an analogue for ¹²⁹I⁻) in compacted
bentonite. The effective diffusion coefficient prediction ac- 
curacy was evaluated based on two training datasets: one 
was collected from the diffusion database system of the 
Japan Atomic Energy Agency (JAEA-DDB) and 15 pub- 
lications; the other contained pseudo-instances produced 
using the multi-porosity model. The main goals of this 
study can be summarized as follows: (i) improve the diffusion
database by measuring the effective diffusion coefficients
of ReO₄⁻, HCrO₄⁻, and I⁻ in compacted bentonite;
(ii) select the machine-learning algorithm with the highest
predictive performance among six models, namely LightGBM,
Extreme Gradient-Boosting (XGBoost), Categorical
Gradient-Boosting (CatBoost), ANN, Random Forest (RF), and
Support Vector Machine (SVM); (iii) determine whether the
machine-learning models capture the diffusion mechanism
by quantitatively analyzing the factors that influence
diffusion. The main novelty of this study
lies in the development of a machine-learning model with 
high predictive accuracy and the interpretation of correlations 
between the influencing factors and the effective diffusion co- 
efficient of radionuclides. 


II. MATERIALS AND METHODS 
A. Materials 


GMZ and Anji bentonite powders were obtained from Gaomiaozi,
Inner Mongolia, and Anji, Zhejiang Province, respectively.
The GMZ bentonite has a grain density of 2660 kg/m³,
particle size (d50) of 7.1 µm, cation exchange capacity
of 77.3 meq/100 g, and external surface area of 25.6 m²/g.
The mineral composition is 74.5 wt% montmorillonite, 12 wt%
quartz, 7 wt% cristobalite, 4 wt% feldspar, 1 wt% calcite, and
1 wt% kaolinite [38]. In contrast, the Anji bentonite has a
particle size (d50) of 11.6 µm, cation exchange capacity of 76
meq/100 g, and external surface area of 60.3 m²/g. The mineral
composition is 46 wt% montmorillonite, 33 wt% quartz,
10 wt% orthoclase, 8 wt% microcline, and 3 wt% calcite [27].

Stock solutions of ReO₄⁻, HCrO₄⁻, and I⁻ were prepared
by weighing certain amounts of KReO₄, K₂Cr₂O₇,
and NaI, and then dissolving them in 200 mL of NaCl solution.
The initial concentrations of ReO₄⁻, HCrO₄⁻, and
I⁻ were 1.12 × 10⁻³ mol/L, (0.26–2.14) × 10⁻³ mol/L, and
0.04 × 10⁻³ mol/L, respectively. An Optima 7000DV inductively
coupled plasma optical emission spectrometer (ICP-OES,
PerkinElmer, USA) was used to measure the concentrations.
All reagents used in this study were of analytical
grade.


B. Diffusion method 


A through-diffusion method, which measures the diffusion
parameters of ions through a specific thickness of porous
material, was applied to investigate the anion (ReO₄⁻, HCrO₄⁻,
and I⁻) diffusion in compacted bentonite. The experiments
were performed using 0.10–0.50 mol/L NaCl solution, with
the compacted dry density ranging from 1300 to 1700 kg/m³,
a pH of 5.6 ± 0.1, and a temperature of 15 ± 3 °C.

The bentonite powder was compacted into blocks (Ø2.54 ×
1.2 cm). Two stainless-steel filters (Ø2.54 × 0.1 cm) were
used to sandwich the blocks. Then, the entire assembly was
inserted into a cylindrical cell. After the bentonite blocks
were saturated with 0.10–0.50 mol/L NaCl solution for one
month, the reservoir connected to one side of the diffusion cell
(x = 0) was replaced with 200 mL of the prepared stock solution
containing ReO₄⁻, HCrO₄⁻, and I⁻. The other side
of the diffusion cell (x = L) was connected to a target
reservoir filled with 10 mL of NaCl solution. The target reservoir
was replaced at given intervals to keep its anion concentration
low, ensuring that it remained at less than 5%
of the concentration at x = 0. A detailed description of the
equipment and experimental procedure can be found in the 
literature [12]. 

The self-programmed Fitting for Diffusion Parameters 
software was used to calculate the rock capacity factor and 
effective diffusion coefficient by analyzing the accumulated 
mass as a function of time. The reliability of the two parame- 
ters was evaluated by examining the consistency between the 
calculated and experimental flux results. 


C. Multi-porosity model 


A multi-porosity model was established for the microstructure
of montmorillonite because montmorillonite is the predominant
mineral in bentonite. This model considers only the
through-pores of compacted bentonite, where the total porosity
($\varepsilon_{\mathrm{tot}}$) is subdivided into three components:
diffuse double-layer porosity ($\varepsilon_{\mathrm{ddl}}$),
interlayer porosity ($\varepsilon_{\mathrm{il}}$), and free-layer
porosity ($\varepsilon_{\mathrm{free}}$) [27, 39]. When compacted
bentonite is saturated with an aqueous solution, the diffuse
double-layer pores form transition zones from the surface of the
bentonite particles to the free pore water; they contain water
molecules, an excess of cations, and a deficit of anions.
The interlayer pores contain
cations and water molecules. Excess cations compensate
for the charge deficit of the tetrahedral-octahedral-tetrahedral 
layers. By contrast, water molecules are arranged in lay- 
ers [7]. Free-layer pores are spaces that comprise charge- 
balanced anions, cations, and water molecules. 

Owing to the anionic exclusion effect, anionic radionu- 
clides can barely enter the interlayer pores of bentonite. 
Therefore, the model assumes that the free-layer pores are 
the predominant diffusion paths, and the accessible porosity 
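$\varepsilon_{\mathrm{acc}}$ is defined as


$$\varepsilon_{\mathrm{acc}} \approx \varepsilon_{\mathrm{free}} = \varepsilon_{\mathrm{tot}} - \varepsilon_{\mathrm{ddl}} - \varepsilon_{\mathrm{il}}. \tag{1}$$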


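The diffuse double-layer porosity $\varepsilon_{\mathrm{ddl}}$, which depends on the
ionic strength, external surface area, mass ratio of montmorillonite,
and compacted dry density, can be estimated as


[Eq. (2), the expression for $\varepsilon_{\mathrm{ddl}}$, is incomplete in the source; only the leading numerical factor "3.09 x 10^(-1...)" is legible.]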


The interlayer porosity depends on the compacted dry density,
water layer fraction, and the mass ratio of montmorillonite.
The interlayer water is related to the degree of
compaction: one water layer (12.2–12.7 Å) is present
at a compacted dry density of 1600–2000 kg/m³, two water
layers (15.2–15.7 Å) at a compacted dry
density of 1300–1600 kg/m³, and three water layers
(18.4–19 Å) at compacted dry densities below 1300
kg/m³ [40]. The interlayer porosity $\varepsilon_{\mathrm{il}}$ is approximately given
by [39]:

where the amount of interlayer water $w_i$ is given by

• 0.119 kg H₂O/kg clay for one water layer at 1600 kg/m³
< $\rho_d$ < 2000 kg/m³,

• 0.238 kg H₂O/kg clay for two water layers at 1300
kg/m³ < $\rho_d$ < 1600 kg/m³,

• 0.357 kg H₂O/kg clay for three water layers at $\rho_d$ <
1300 kg/m³.

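The layer fraction, $x_i$, is approximately calculated as follows,
where the subscript i denotes one, two, or three water
layers [27, 39].

• At 1300 kg/m³ < $\rho_d$ < 1600 kg/m³,


[The layer-fraction equations are incomplete in the source; only the fragment $\rho_d - \rho_{d,\,3\mathrm{WL}\rightarrow 2\mathrm{WL}}$ is legible.]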
The effective diffusion coefficient is estimated by combining
Eq. (1) with Archie’s equation, as follows:


$$D_e = D_w \cdot \varepsilon_{\mathrm{acc}}^{\,n} = D_w \cdot (\varepsilon_{\mathrm{tot}} - \varepsilon_{\mathrm{ddl}} - \varepsilon_{\mathrm{il}})^{n}, \tag{7}$$


where $D_w$ denotes the species diffusion coefficient in water
(1.46 × 10⁻⁹ m²/s for ReO₄⁻, 1.13 × 10⁻⁹ m²/s for HCrO₄⁻,
and 2.0 × 10⁻⁹ m²/s for I⁻ [41]) and n (−) denotes the cementation
factor.
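
The combination of Eq. (1) and Archie’s equation can be illustrated with a minimal Python sketch. The D_w values are those quoted above; the function names, the porosity split (0.51, 0.10, 0.09), and the cementation factor of 2.7 are hypothetical values chosen only for illustration and are not taken from the fitted model.

```python
# Minimal sketch of Eqs. (1) and (7); all inputs below are illustrative assumptions.

# Species diffusion coefficients in water (m^2/s), as quoted in the text [41].
D_W = {"ReO4-": 1.46e-9, "HCrO4-": 1.13e-9, "I-": 2.0e-9}


def accessible_porosity(eps_tot, eps_ddl, eps_il):
    """Eq. (1): porosity left for anions after removing DDL and interlayer pores."""
    return eps_tot - eps_ddl - eps_il


def effective_diffusion_coefficient(d_w, eps_acc, n):
    """Eq. (7), Archie's relation: D_e = D_w * eps_acc**n."""
    if eps_acc < 0:
        raise ValueError("accessible porosity must be non-negative")
    return d_w * eps_acc ** n


# Hypothetical porosity split and cementation factor (not from the paper).
eps_acc = accessible_porosity(eps_tot=0.51, eps_ddl=0.10, eps_il=0.09)
d_e = effective_diffusion_coefficient(D_W["ReO4-"], eps_acc, n=2.7)
print(f"eps_acc = {eps_acc:.2f}, D_e = {d_e:.2e} m^2/s")
```

Because n is larger than one, a modest reduction in accessible porosity lowers D_e disproportionately, which is why the compacted dry density enters the prediction so strongly.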


D. Database description and analysis 


The training dataset was obtained from two sources. One is 
a dataset containing experimental instances from the JAEA- 
DDB (223 instances, 1989-2005) [42] and 15 publications 
(99 instances, 2006-2024), which are listed in Table S1 in 
the Supporting Information. The other dataset contained
pseudo-instances produced using the multi-porosity model 
(15,000 instances) [27]. Table 1 summarizes the statistical 
information for the datasets. For the dataset collected from 
JAEA-DDB and the literature, only instances of anion diffu- 
sion in bentonite were chosen. 

Data pre-processing was performed using the Mahalanobis
distance (MD) to remove outliers. MD is a distance measure
used extensively in multivariate spaces; it accounts for the
mean and covariance of the data. The MD of each instance
($d_i$) is defined as [37]


$$d_i = \sqrt{(X_i - \bar{X})^{T} \cdot C^{-1} \cdot (X_i - \bar{X})}, \tag{8}$$


where $C$, $X_i$, and $\bar{X}$ are the covariance matrix of the
sample, the object vector, and the arithmetic mean vector,
respectively. In this study, the cutoff point was set to five, resulting
in 29 instances being identified as outliers. Therefore, the dataset
used for the machine-learning models contained 197 instances
from JAEA-DDB and 96 instances from publications.
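
The outlier filter of Eq. (8) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' code: it assumes that the cutoff of five is applied to the distance itself (not to its square), and the synthetic array `X_demo` merely stands in for the feature table.

```python
import numpy as np


def mahalanobis_distances(X):
    """Eq. (8): distance of every row of X from the sample mean."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)                    # X_i - X_bar
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))  # C^-1 (pseudo-inverse for safety)
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.clip(d2, 0.0, None))


def remove_outliers(X, cutoff=5.0):
    """Keep rows whose distance does not exceed the cutoff (assumed convention)."""
    distances = mahalanobis_distances(X)
    keep = distances <= cutoff
    return np.asarray(X)[keep], keep, distances


# Synthetic stand-in data: 200 well-behaved rows plus one planted outlier.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
X_demo[0] = [50.0, -50.0, 50.0]
X_clean, keep_mask, distances = remove_outliers(X_demo, cutoff=5.0)
print(f"removed {np.count_nonzero(~keep_mask)} of {len(X_demo)} rows")
```

Note that the covariance matrix is estimated from data that still contain the outliers, so a single extreme point inflates C and slightly shrinks every distance; this is one reason the cutoff is a judgment call.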

The input features were the rock capacity factor, com- 
pacted dry density, mass ratio of montmorillonite, species dif- 
fusion coefficient in water, ionic strength, and temperature. 
The input features for the multi-porosity model dataset were 
the external surface area, mass ratio of montmorillonite, ionic 
strength, accessible porosity, compacted dry density, cemen- 
tation factor, fitting parameter, and species diffusion coeffi- 
cient in water. Among these features, the rock capacity fac- 
tor indicates the ability of the bentonite barrier to impede ra- 
dionuclide diffusion into the granite rock. If the rock capacity 
factor is less than the total porosity, it is equal to the acces- 
sible porosity. The external surface area, accessible porosity, 
and cementation factor indicate the bentonite characteristics, 
while the species diffusion coefficient in water indicates the 
radionuclide properties. The remaining features, such as the 
temperature, ionic strength, and compacted dry density, are 
parameters related to the experimental conditions. The effec- 
tive diffusion coefficient is the only output feature. 

The test dataset consisted of eight instances obtained from
the diffusion of ReO₄⁻, HCrO₄⁻, and I⁻ using the
through-diffusion method. Given that both the effective diffusion
coefficient and the species diffusion coefficient in water were in
the range of 10⁻¹¹ to 10⁻⁹ m²/s, a logarithmic conversion
was applied to maintain consistency with the range of the other
features, which spanned values from 0 to 2000. This data
pre-processing improves the performance [37].


Table 1. Statistical information for the training dataset for machine-learning models

Data source            Type    Parameter                                        Mean      Min      Max      Std      Skw
JAEA-DDB/Publications  Input   Rock capacity factor, α                          1.45      0.01     19.08    2.79     4.33
                               Compacted dry density, ρd (kg/m³)                1303.89   400      2000     326.16   −0.38
                               Species diffusion coefficient in water, log Dw   −8.74     −9.30    −8.24    0.14     −0.54
                               Temperature, T (°C)                              29.38     12.00    90.00    15.36    2.18
                               Mass ratio of montmorillonite, m                 0.78      0.33     1.00     0.18     −1.08
                               Ionic strength, I (mol/L)                        0.25      0.01     1.03     0.21     1.02
                       Output  Effective diffusion coefficient, log De          −10.25    −12.60   −9.17    0.72     −1.02
Multi-porosity model   Input   External surface area, Aext (m²/g)               69.20     10.00    129.98   34.5     0.03
                               Mass ratio of montmorillonite, m                 0.65      0.30     1.00     0.20     0.00
                               Ionic strength, I (mol/L)                        0.78      0.05     1.50     0.42     −0.02
                               Accessible porosity, εacc                        0.22      0.00     0.53     0.11     0.32
                               Compacted dry density, ρd (kg/m³)                1497      1000     2000     287      0.01
                               Cementation factor, n                            2.71      2.00     3.40     0.40     −0.03
                               Fitting parameter, lf                            0.85      0.70     1.00     0.09     0.01
                               Species diffusion coefficient in water, log Dw   −8.54     −9.09    −8.24    0.22     −0.66
                       Output  Effective diffusion coefficient, log De          −10.51    −19.73   −8.88    0.88     −1.93

Std = Standard Deviation; Skw = Skewness


E. Performance evaluation of the machine-learning model 


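The predictive accuracy was evaluated using R² and MSE.
These parameters were respectively calculated as follows:


$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left(\log D_{e,i}^{\mathrm{exp}} - \log D_{e,i}^{\mathrm{pred}}\right)^2}{\sum_{i=1}^{N} \left(\log D_{e,i}^{\mathrm{exp}} - \log D_{e,\mathrm{ave}}^{\mathrm{exp}}\right)^2}, \tag{9}$$

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(\log D_{e,i}^{\mathrm{exp}} - \log D_{e,i}^{\mathrm{pred}}\right)^2, \tag{10}$$


where N is the number of instances, and $\log D_{e,i}^{\mathrm{exp}}$ and
$\log D_{e,i}^{\mathrm{pred}}$ represent the experimental and predicted
output values, respectively. $\log D_{e,\mathrm{ave}}^{\mathrm{exp}}$ denotes
the average of the experimental instances. Increased predictive
accuracy is associated with an
increase in R² and a decrease in MSE.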
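
Both metrics are simple to compute directly from Eqs. (9) and (10). The sketch below uses plain Python; the four experimental and predicted log D_e values are made-up numbers for illustration only and do not come from the datasets.

```python
def mse(log_de_exp, log_de_pred):
    """Eq. (10): mean squared error between experimental and predicted log D_e."""
    n = len(log_de_exp)
    return sum((e - p) ** 2 for e, p in zip(log_de_exp, log_de_pred)) / n


def r2(log_de_exp, log_de_pred):
    """Eq. (9): coefficient of determination."""
    mean_exp = sum(log_de_exp) / len(log_de_exp)
    ss_res = sum((e - p) ** 2 for e, p in zip(log_de_exp, log_de_pred))
    ss_tot = sum((e - mean_exp) ** 2 for e in log_de_exp)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, chosen so that every prediction is off by 0.1 log units.
log_de_exp_demo = [-10.1, -10.4, -10.9, -11.2]
log_de_pred_demo = [-10.2, -10.3, -11.0, -11.1]
print(f"R2 = {r2(log_de_exp_demo, log_de_pred_demo):.3f}, "
      f"MSE = {mse(log_de_exp_demo, log_de_pred_demo):.3f}")
```

Because the metrics are evaluated on log D_e, an MSE of 0.01 corresponds to a typical error of about 0.1 log units, that is, roughly a factor of 1.26 in D_e.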


Five-fold cross-validation (CV) was employed to mitigate 
overfitting, a situation characterized by high predictive per- 
formance in the training or validation datasets, but low accu- 
racy in the test dataset, resulting in poor generalization and 
reduced robustness of the machine-learning model. In this 
approach, the dataset was randomly divided into five equally 
sized subsamples, with four subsamples used for training and 
one used for testing. 
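
The five-fold split described above can be sketched as follows. This is a generic illustration in plain Python, not the authors' pipeline; the function name `k_fold_splits` and the seed are assumptions. With 293 instances the folds cannot be exactly equal, so their sizes differ by at most one.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Shuffle the indices once, then hold out each of the k folds in turn."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    splits = []
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, held_out))
    return splits


# 293 is the size of the JAEA-DDB/publications dataset after outlier removal.
splits = k_fold_splits(293, k=5)
fold_sizes = [len(held_out) for _, held_out in splits]
print(fold_sizes)
```

Each instance is held out exactly once, so averaging the five held-out scores uses every instance for evaluation without ever scoring a model on data it was trained on.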


III. RESULTS AND DISCUSSION
A. Database distribution and characteristics 


Figures 1A–1F show the dependence of the effective
diffusion coefficient on each input feature for the
JAEA-DDB/publications dataset. The corresponding dependence
for the multi-porosity model dataset can be found in a previous study [27]. The
histograms and kernel curves displayed on the top and right 
sides of each plot correspond to the distribution of the input 
features and effective diffusion coefficient. The shape of the 
curves is determined by the data point concentration; a high 
data point concentration results in a higher peak amplitude. 

The rock capacity factor can be obtained directly using the
through-diffusion method, or calculated as follows [27, 39]:


$$\alpha = \varepsilon + \rho_d \cdot K_d, \tag{11}$$


where $\varepsilon$ (−), $\rho_d$ (kg/m³), and $K_d$ (m³/kg) denote the
porosity, compacted dry density, and distribution coefficient,
respectively. Specifically, the total porosity of compacted
bentonite is taken as the porosity accessible to neutral molecules,
such as HTO [10, 11, 17]. In contrast, the accessible porosity was assumed
to be the porosity available to anionic radionuclides such as ³⁶Cl⁻ and
I⁻ [10-12]. This assumption is based on the anionic exclusion
effect, in which anionic radionuclides are hindered from
approaching the negatively charged bentonite surface [17, 43]. The
rock capacity factor ranged from 0.01 to 19.08 (Table 1).
Most data points are concentrated below two. Specifically,
13.3% of the data points exceeded two, 23.2% ranged from
unity to two, and 63.5% were less than unity. Notably, only
nine data points were higher than ten (Fig. 1A). Oscarson et
al. [44] reported that the rock capacity factors of ⁹⁹TcO₄⁻
and ¹²⁵I⁻ were greater than five, accounting for 4.9% of the
high values. This abnormal observation may be attributed to
the calculation method using Eq. (11). The distribution
coefficient may have been overestimated, because it was measured
through sorption experiments with powdered bentonite.
Fig. 1. Distribution and characteristics of the input features and output variable.
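
The sensitivity of the calculated rock capacity factor to the distribution coefficient can be illustrated with a short sketch. It assumes the common form α = ε + ρ_d·K_d for the relation discussed above; the K_d value of 1 × 10⁻³ m³/kg is hypothetical and serves only to show the scale of the effect.

```python
def rock_capacity_factor(porosity, dry_density, k_d):
    """Assumed form alpha = eps + rho_d * K_d (rho_d in kg/m^3, K_d in m^3/kg)."""
    return porosity + dry_density * k_d


# Non-sorbing anion: K_d = 0, so alpha reduces to the porosity term.
alpha_non_sorbing = rock_capacity_factor(porosity=0.30, dry_density=1300.0, k_d=0.0)

# Hypothetical K_d of 1e-3 m^3/kg (1 mL/g) from a batch sorption test.
alpha_sorbing = rock_capacity_factor(porosity=0.30, dry_density=1300.0, k_d=1.0e-3)

print(alpha_non_sorbing, alpha_sorbing)
```

Because ρ_d is of the order of 10³ kg/m³, even a small K_d measured on powdered bentonite dominates the sum, which is consistent with the explanation given above for the unusually high rock capacity factors.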


The effective diffusion coefficient increased with a decrease
in the compacted dry density (Fig. 1B), which is consistent
with previous experimental results [13-18]. As expected,
the effective diffusion coefficient increased
with increasing species diffusion coefficient in water and
temperature (Figs. 1C and 1D). This behavior follows two
diffusion laws. The first is
Archie’s law, which can be expressed as $D_e = D_w \cdot \varepsilon^{n}$ [39, 43].
The second is the Arrhenius equation, given by
$D_e = A \cdot e^{-E_a/RT}$ [45]. Figures 1E and 1F show that the data
points are scattered, with most of the data concentrated within
a limited range. This can be explained by the fact that there is
no strict one-to-one dependency of the effective diffusion
coefficient on the mass ratio of montmorillonite or the ionic
strength.


B. Measurement of diffusion parameters using the 
through-diffusion method 


The through-diffusion method was used to determine the
diffusion parameters of ReO₄⁻, HCrO₄⁻, and I⁻ in compacted
bentonite. Fig. 2 shows the breakthrough curves under
various salinity and compaction conditions. The impact
of salinity on diffusion in GMZ bentonite is shown in Figs.
2A–2E, while the effect of compacted dry density in Anji
bentonite is presented in Figs. 2F–2H. The red dots and lines
represent the flux results, while the blue dots and lines represent
the accumulated mass results. The solid dots represent
the experimental data, the lines represent the calculated
accumulated mass or flux as a function of time,
and the shaded area indicates the calculated
upper and lower limits, which consider the uncertainties of
the rock capacity factor and the effective diffusion coefficient.
These uncertainties are associated with various factors,
such as the sample weight, volume of the bentonite block and
stainless-steel filters, dead volume of the diffusion cells, and
ICP-OES measurements. The ionic strength I (mol/L) was
calculated as follows:


$$I = \frac{1}{2} \sum_{i=1}^{n} C_i z_i^{2}, \tag{12}$$


where $C_i$ is the total concentration of each species i in the
solution, including Na⁺, K⁺, Cl⁻, ReO₄⁻, HCrO₄⁻, and I⁻, and $z_i$
is the charge number of species i.
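
As a worked example of Eq. (12), the following sketch computes the ionic strength of a background electrolyte and of the same electrolyte with a tracer salt added. The species list is illustrative: it uses the 0.10 mol/L NaCl and 1.12 × 10⁻³ mol/L KReO₄ concentrations mentioned in the text and assumes complete dissociation.

```python
def ionic_strength(species):
    """Eq. (12): I = 0.5 * sum(C_i * z_i**2) over (concentration, charge) pairs."""
    return 0.5 * sum(conc * charge ** 2 for conc, charge in species)


# 0.10 mol/L NaCl, assumed fully dissociated: Na+ and Cl-.
nacl_only = [(0.10, +1), (0.10, -1)]

# Same electrolyte plus 1.12e-3 mol/L KReO4: adds K+ and ReO4-.
nacl_with_tracer = nacl_only + [(1.12e-3, +1), (1.12e-3, -1)]

i_nacl = ionic_strength(nacl_only)
i_tracer = ionic_strength(nacl_with_tracer)
print(f"I(NaCl) = {i_nacl:.4f} mol/L, I(NaCl + KReO4) = {i_tracer:.4f} mol/L")
```

For a 1:1 electrolyte the ionic strength equals the molar concentration, so the monovalent tracer salts shift it only slightly above the NaCl background.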

Table 2 lists the diffusion parameters of ReO₄⁻, HCrO₄⁻,
and I⁻ in compacted bentonite. Both the rock capacity factor
and the effective diffusion coefficient in compacted bentonite
increase as the ionic strength increases and the compacted dry
density decreases. These trends are consistent with those
reported in previous experimental studies [9, 16-19, 24]. However,
compared with previous results for GMZ bentonite [24],
ReO₄⁻ had a higher effective diffusion coefficient, which can be
explained by the lower compacted dry density
investigated in this study. In comparison with previous studies
[9, 18], higher effective diffusion coefficient values for
HCrO₄⁻ and I⁻ were observed. This could also be explained
by the higher ionic strength and lower mass ratio of montmorillonite
in the HCrO₄⁻ diffusion experiments and the higher
ionic strength and compacted dry density in the I⁻ diffusion
experiments. The experimental results comply with the diffusion
rules for anions and fall within the training dataset range.
Therefore, the test dataset was deemed suitable for evaluation
purposes. It is noteworthy that the measured rock capacity factors
of ReO₄⁻, HCrO₄⁻, and I⁻ are less than the total
porosity, indicating that these anions are not adsorbed onto the
bentonite surface. The rock capacity factors are therefore equivalent to
the accessible porosity.

Fig. 2. Flux, J(L,t), and accumulated mass, Acum, as a function of time. pH = 5.6 ± 0.1; C0(Re) = 1.12 × 10⁻³ mol/L; C0(Cr) = (0.26–2.14)
× 10⁻³ mol/L; C0(I) = 0.04 × 10⁻³ mol/L; T = 15 ± 3 °C.


C. Prediction by the machine-learning algorithms 


Six machine-learning algorithms, namely LightGBM,
XGBoost, CatBoost, ANN, RF, and SVM, were employed
to predict the effective diffusion coefficient using two training
datasets. One dataset comprised eight input features
and 15,000 pseudo-instances produced by the multi-porosity
model. The other dataset included six input features and 293
instances sourced from JAEA-DDB and 15 publications (Table 1).
The datasets were divided into training and validation
sets at a ratio of 4:1. The test dataset for the machine-learning
models consisted of the experimental results listed in Table
2. LightGBM, XGBoost, CatBoost, and RF are ensemble-learning
algorithms, while ANN and SVM are traditional
learning algorithms. Table 3 lists the mean values of the two
performance metrics for the test datasets of the six
machine-learning models using the five-fold cross-validation
technique. With the JAEA-DDB/publications dataset, LightGBM
outperformed the other machine-learning
models in terms of predictive performance, achieving the
highest R²_CV of 0.87 and the lowest MSE_CV of 0.01.

Hyperparameters, which are an integral part of machine-learning
models, cannot be learned from the dataset. They
were set prior to model training to control the models’ learning
process. The grid search (GS) method was used to tune
the hyperparameters. Candidate values for each hyperparameter
were manually predefined, and the model was trained
with every combination of the specified values.
Cross-validation on the training datasets was used to score
each combination. After evaluating all combinations, the
combination with the best model performance was selected.
Table 4 summarizes the tuned hyperparameters for
each machine-learning model.
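
The grid search procedure can be sketched generically as follows. To keep the example self-contained, a one-dimensional ridge regression stands in for the six algorithms used in the study; the hyperparameter names (`alpha`, `fit_intercept`), the candidate values, and the synthetic data are all assumptions made for illustration.

```python
import itertools
import random


def fit_ridge_1d(xs, ys, alpha, fit_intercept):
    """Closed-form 1-D ridge regression (stand-in model); returns (slope, intercept)."""
    x0 = sum(xs) / len(xs) if fit_intercept else 0.0
    y0 = sum(ys) / len(ys) if fit_intercept else 0.0
    sxy = sum((x - x0) * (y - y0) for x, y in zip(xs, ys))
    sxx = sum((x - x0) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    return slope, y0 - slope * x0


def cv_mse(xs, ys, params, k=5, seed=0):
    """Mean validation MSE over k folds for one hyperparameter combination."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    fold_errors = []
    for val in folds:
        val_set = set(val)
        train = [i for i in idx if i not in val_set]
        slope, intercept = fit_ridge_1d(
            [xs[i] for i in train], [ys[i] for i in train], **params)
        fold_errors.append(
            sum((ys[i] - (slope * xs[i] + intercept)) ** 2 for i in val) / len(val))
    return sum(fold_errors) / k


def grid_search(xs, ys, grid):
    """Try every combination in the grid and keep the lowest cross-validated MSE."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_mse(xs, ys, params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Synthetic data: y = 2x + 3 plus small Gaussian noise.
rng = random.Random(1)
xs_demo = [rng.uniform(0.0, 1.0) for _ in range(100)]
ys_demo = [2.0 * x + 3.0 + rng.gauss(0.0, 0.05) for x in xs_demo]
grid_demo = {"alpha": [0.001, 0.1, 10.0], "fit_intercept": [True, False]}
best_params, best_score = grid_search(xs_demo, ys_demo, grid_demo)
print(best_params, round(best_score, 4))
```

The cost of an exhaustive grid grows multiplicatively with the number of candidate values per hyperparameter, which is why the candidate ranges are usually narrowed manually before the search.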


A comparison between the experimental and predicted effective
diffusion coefficients is presented in Fig. 3, where the
dots indicate the experimental data, the red lines represent the
linear fit of the experimental data, and the shaded areas represent
the 95% confidence interval. For the multi-porosity
model dataset, the predictive accuracy is ranked in descending order
as SVM > XGBoost > RF > LightGBM > CatBoost > ANN
(Figs. 3A–3F). The SVM outperformed the other
machine-learning models in terms of predictive performance, with an
MSE of 0.01 and R² of 0.83. By contrast, when using the
JAEA-DDB/publications dataset, the predictive accuracy is
ranked in descending order as LightGBM > CatBoost >
XGBoost > SVM > RF > ANN (Figs. 3G–3L). All
gradient-boosting algorithms exhibited high performance, with R² values
above 0.88. LightGBM and XGBoost achieved similar
predictive accuracies, with an MSE of 0.01 and R² of
0.91. The models trained on the JAEA-DDB/publications dataset
outperformed those trained on the multi-porosity model dataset.
This can be attributed to the complexity of
the predictive tasks that involve predicting multiple species
(ReO₄⁻, HCrO₄⁻, and I⁻) under different salinity and
compaction conditions. This complexity poses a significant
challenge for effectively training machine-learning models using
the dataset generated from the multi-porosity model, as the
predictive accuracy is notably influenced by the quality of
the model. In general, boosting models that combine weak
learners using weight-based aggregation exhibit stronger
prediction capabilities. This finding is consistent with the results
of previous studies [27, 46]. Notably, LightGBM achieves a
higher predictive accuracy among boosting models because
it utilizes two innovative techniques: gradient-based one-side
sampling and exclusive feature bundling [27, 47].


Table 2. Overview of the diffusion parameters for anions in compacted bentonite.

Clay   Anion    I (mol/L)   ρd (kg/m³)   C0 (×10⁻³ mol/L)   De (×10⁻¹¹ m²/s)   α (−)         εtot (−)
GMZ    ReO₄⁻    0.12        1300         1.12 ± 0.05        7.1 ± 0.7          0.32 ± 0.04   0.51
GMZ    ReO₄⁻    0.32        1300         1.12 ± 0.05        8.1 ± 0.7          0.40 ± 0.06   0.51
GMZ    HCrO₄⁻   0.12        1300         2.14 ± 0.07        5.6 ± 0.7          0.46 ± 0.04   0.51
GMZ    HCrO₄⁻   0.32        1300         2.14 ± 0.07        6.4 ± 0.6          0.50 ± 0.04   0.51
GMZ    I⁻       0.42        1300         0.04 ± 0.01        9.1 ± 0.7          0.30 ± 0.06   0.51
Anji   HCrO₄⁻   0.50        1300         0.26 ± 0.01        7.1 ± 0.4          0.42 ± 0.04   0.54
Anji   HCrO₄⁻   0.50        1500         0.27 ± 0.01        3.8 ± 0.2          0.35 ± 0.03   0.46
Anji   HCrO₄⁻   0.50        1700         0.26 ± 0.01        1.2 ± 0.2          0.22 ± 0.02   0.39


Table 3. Mean values of different performance metrics using the
five-fold cross-validation technique for the test dataset.

Algorithm   JAEA-DDB/Publications     Multi-porosity model
            R²_CV      MSE_CV         R²_CV      MSE_CV
LightGBM    0.87       0.01           0.74       0.02
CatBoost    0.85       0.01           0.73       0.02
XGBoost     0.73       0.02           0.75       0.02
SVM         0.72       0.02           0.78       0.02
RF          0.79       0.02           0.61       0.03
ANN         0.72       0.02           0.50       0.04


D. Shapley Additive Explanation and Feature Importance 
analyses 


The Shapley Additive Explanation (SHAP) and Feature
Importance (FI) methods are two widely used feature attribution
methods that can identify the weight or significance of the
input features driving the predictions [34]. Although SHAP
and FI analyses employ distinct techniques to characterize
importance, both reflect the influence on the predicted
output by ranking the importance of the input features
[48]. In this study, they were applied to the LightGBM model
trained on the JAEA-DDB/publications dataset, which yielded the
highest predictive accuracy among the six machine-learning
models. Higher SHAP and FI values for a feature indicate a
greater impact on the predicted effective diffusion coefficient. As can
be seen in Fig. 4, the rock capacity factor and the compacted
dry density are the two most important input features for effective
diffusion coefficient prediction. For the remaining four
features, the FI analysis ranked them in descending order as
T > log Dw > I > m, while the SHAP analysis ranked
them as log Dw ≈ T > m > I. The difference in the ranking of the
montmorillonite mass ratio between the two analyses can be
attributed to the different underlying principles and assumptions of the
two techniques.

Ionic strength is closely associated with the electrical dou- 
ble layer located at the bentonite interface [9]. Although ionic 
strength had a limited effect on the effective diffusion coeffi- 


Table 4. Hyperparameters and other parameters for machine learn- 
ing models. 


Algorithm Parameter Values 
Multi-porosity JAEA-DDB 
model Publications 
Num_boost_round 10000 10000 
Max_depth 2 1 
: Learning_rate 0.001 0.05 
LightGBM Num_leaves 30 30 
Min_data_in_leaf 21 14 
Feature_fraction 0.5 0.45 
Boosting gbdt gbdt 
Bagging freq 30 4 
Bagging seed 25 1 
Bagging fraction 0.5 0.5 
Lambda_l1 9 0.01 
Lambda_12 0 0.08 
Iterations 2000 200 
Depth 11 7 
CatBoost Learning_rate 0.01 0.48 
Subsample 0.70 0.81 
Metric_period 500 100 
L2_leaf_reg 39 0.97 
Rsm 0.4 0.4 
Random_seed 87 43 
Num_boost_round 1500 1000 
Max_depth 3 10 
Eta 0.1 0.04 
XGBoost Gamma 2 0.01 
Lambda 1 0.33 
Subsample 0.17 0.72 
Min_child_weight T 12 
Reg_alpha 3 0.1 
Booster gbtree gbtree 
Colsample_bytree 0.8 0.2 
Cache_size 100 1 
Gamma 0.001 0.01 
SYM Kernel Rbf Rbf 
C 0.05 31 
Epsilon 0.01 0.44 
N_estimators 3 21 
Max_depth 4 1 
RF 
Max_features auto auto 
Min_samples_split 2 2 
Min_samples_leaf 4 0.15 
Min_weight_fraction_leaf 0.04 0.05 
Random_state 85 4 
Epochs 10000 10000 
Learning_rate 0.005 0.005 
Hidden layers 3 3 
ANN Number of neurons 64 100 
Activation function PReLU PReLU 
Dropout 0.2 0.2 
Fig. 3. Comparison between the experimental and predicted effective diffusion coefficients based on (A-F) the multi-porosity (MP)
model dataset and (G-L) the dataset from the diffusion database system of the Japan Atomic Energy Agency (JAEA-DDB) and 15 publications, using the
(A, G) Light Gradient-Boosting, (B, H) Extreme Gradient-Boosting, (C, I) Categorical Gradient-Boosting, (D, J) Artificial Neural Network,
(E, K) Random Forest, and (F, L) Support Vector Machine models.
Fig. 4. Feature Importance and mean absolute Shapley Additive
Explanations values for each feature using the Light Gradient-Boosting
model.


cient prediction (Fig. 4), its influence on radionuclide dif- 
fusion has been investigated in previous experimental diffu- 
sion studies [9, 19, 24]. The effective diffusion coefficient in- 
creases in solutions with high salinity until the ionic strength 
exceeds 0.5 mol/L. This observation is explained by the electrical
double layer reaching its minimum thickness, which results in
negligible diffuse double-layer pores and a maximum width of the
free-layer pores [9, 19, 24]. The effect of the electrical double
layer on radionuclide diffusion nevertheless remains under debate
[9, 49]. The limited effect of ionic strength on the prediction can
be explained by the small proportion of porosity in the diffuse
double-layer pores [28]. It is worth noting that the weight of a
feature depends on the input features, instances, and algorithm
used. Further research is needed to clarify the importance of ionic
strength in radionuclide diffusion.
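
The dependence of a feature's weight on the instances can be illustrated with a small sketch. It computes exact Shapley values by enumerating feature coalitions and then averages their absolute values per feature, as in a mean absolute SHAP ranking. The model `toy_model` and the two datasets are hypothetical, and the brute-force enumeration is a stand-in for illustration only, not the tree-specific procedure applied to the Light Gradient-Boosting model in this study.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values of model(x) against a background sample.

    Features outside a coalition are filled in from each background row
    and the predictions are averaged (an interventional value function).
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in coalition else row[j] for j in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shap(model, data):
    """Mean absolute Shapley value per feature, using data as its own background."""
    n = len(data[0])
    totals = [0.0] * n
    for x in data:
        for i, p in enumerate(shapley_values(model, x, data)):
            totals[i] += abs(p)
    return [t / len(data) for t in totals]


# Hypothetical toy model: strong dependence on feature 0, weak on feature 1.
def toy_model(x):
    return 2.0 * x[0] + 0.1 * x[1]


# Same model, two hypothetical datasets that differ only in the spread of feature 1.
narrow = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
wide = [[0.0, 0.0], [1.0, 50.0], [2.0, 100.0]]

print("narrow:", mean_abs_shap(toy_model, narrow))
print("wide:  ", mean_abs_shap(toy_model, wide))
```

With the narrow dataset, feature 0 ranks first; with the wide dataset, feature 1 overtakes it although the model is unchanged. This is the sense in which a feature's weight depends on the instances as well as on the algorithm.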


E. Partial Dependence Plot analysis 


Partial Dependence Plot (PDP) analysis characterizes the
relationship between each input feature and the output feature
[34]. These plots provide a quantitative assessment of the positive
and negative effects of the six input features on the effective
diffusion coefficient (Fig. 5). A feature with a strong impact on
the output variable produces pronounced changes in its PDP curve,
indicating a significant contribution to the model's prediction. By
contrast, a feature with little impact results in a flat or nearly
constant PDP curve.

Fig. 5. Partial Dependence Plot analysis of the effect of input features on the effective diffusion coefficient. The blue lines show the
partial dependence values, and the gray columns show the distribution of data points over the values of each input feature.
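
The partial dependence calculation can be sketched in a few lines: for each grid value of one feature, that feature is overwritten in every instance and the model predictions are averaged. The function `toy_model`, the data, and the grid are hypothetical stand-ins for a trained regressor and its dataset; they are not the models or data of this study.

```python
def partial_dependence(model, data, feature, grid):
    """Average prediction when `feature` is forced to each grid value in turn."""
    curve = []
    for g in grid:
        total = 0.0
        for row in data:
            modified = list(row)
            modified[feature] = g
            total += model(modified)
        curve.append(total / len(data))
    return curve


# Hypothetical stand-in for a trained regressor: feature 0 raises the output,
# feature 1 lowers it, and feature 2 is ignored.
def toy_model(x):
    return -10.0 + 0.5 * x[0] - 0.3 * x[1]


data = [[0.0, 1.0, 5.0], [1.0, 2.0, 6.0], [2.0, 3.0, 7.0]]
grid = [0.0, 1.0, 2.0]

for label, index in [("positive", 0), ("negative", 1), ("flat", 2)]:
    print(label, partial_dependence(toy_model, data, index, grid))
```

The first curve rises, the second falls, and the third is constant, which corresponds to the positive, negative, and negligible impacts distinguished in the text.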


The rock capacity factor, species diffusion coefficient in 
water, ionic strength, and temperature positively impacted 
the effective diffusion coefficient, whereas the compacted dry 
density and montmorillonite mass ratio negatively impacted 
it. In other words, the effective diffusion coefficient increases 
with increasing species diffusion coefficient in water, temper- 
ature, and ionic strength, which is consistent with Archie’s 
law [39, 43], the Arrhenius equation [45], and previous exper- 
imental results [10, 19-24]. Conversely, the effective diffu- 
sion coefficient decreases in compacted bentonite with a high 
compacted dry density and high montmorillonite mass ratio, 
which is also consistent with previous experimental results 
[9, 13-15]. 
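
These trends can be reproduced qualitatively with the textbook forms of Archie's law and the Arrhenius equation. The sketch below is an illustration under stated assumptions, not the parameterization used in this study: the grain density, Archie exponent, activation energy, and diffusion coefficient in water are hypothetical values, and the total porosity is used without any correction for anion exclusion.

```python
import math

R_GAS = 8.314  # J/(mol K), universal gas constant


def porosity_from_dry_density(rho_d, rho_s=2700.0):
    """Total porosity from dry density; rho_s is an assumed grain density (kg/m^3)."""
    return 1.0 - rho_d / rho_s


def archie_de(d_w, porosity, n):
    """Archie's law for diffusion: De = Dw * porosity**n."""
    return d_w * porosity ** n


def arrhenius_de(de_ref, t_ref, t, e_a):
    """Scale De from t_ref to t (both in kelvin) with activation energy e_a (J/mol)."""
    return de_ref * math.exp(-e_a / R_GAS * (1.0 / t - 1.0 / t_ref))


# All parameter values below are illustrative assumptions.
D_W = 1.5e-9   # m^2/s, species diffusion coefficient in water
N_EXP = 2.0    # Archie exponent
E_A = 18.0e3   # J/mol, activation energy

# De falls as the compacted dry density rises (porosity shrinks).
for rho_d in (1200.0, 1500.0, 1800.0):
    eps = porosity_from_dry_density(rho_d)
    print(f"rho_d = {rho_d:.0f} kg/m^3, De = {archie_de(D_W, eps, N_EXP):.2e} m^2/s")

# De rises with temperature at a fixed dry density.
de_25 = archie_de(D_W, porosity_from_dry_density(1500.0), N_EXP)
for t_c in (25.0, 40.0, 60.0):
    de_t = arrhenius_de(de_25, 298.15, t_c + 273.15, E_A)
    print(f"T = {t_c:.0f} C, De = {de_t:.2e} m^2/s")
```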

Among the input features, the rock capacity factor had the most
significant influence on the effective diffusion coefficient, in
agreement with the SHAP and FI analyses. An increase in the rock
capacity factor from 0.01 to 19.08 raised the PDP value from
−11.16 to −9.90, an increase of approximately 11.3% (Fig. 5A). It
is worth noting that the rock capacity factor of radionuclide
anions should be lower than the total porosity if the anionic
exclusion effect is considered [39, 43], indicating that the
anionic instances with a rock capacity factor above the total
porosity should be removed from JAEA-DDB. Nonetheless, these
instances were retained in this study for database integrity. The
percentage increases in the PDP value rank in descending order as
follows: α (11.3%) > T (8.8%) > log D_w (6.5%) > I (1.3%).

The compacted dry density had a negative impact: an increase from
400 to 1700 kg/m³ lowered the PDP value from −9.85 to −10.74, a
decrease of approximately 9.0% (Fig. 5B). This finding is
consistent with the results of previous studies [13-18]. The
montmorillonite mass ratio also had a negative impact; an increase
from 0.33 to 1.0 lowered the PDP value from −10.02 to −10.31, a
decrease of approximately 2.8% (Fig. 5E). This indicates that
bentonites with a low montmorillonite mass ratio, such as the
illite/smectite mixed-layer (I/S) (m = 0.33) and Kunigel V1
(m = 0.46-0.49) bentonites, exhibit a higher effective diffusion
coefficient, in agreement with the findings of previous studies
[7, 9, 13, 16, 25]. Generally, bentonite barriers with a higher
montmorillonite mass ratio exhibit better blocking abilities
against radionuclides [5]. As shown in Figs. 5D and 5F, the
predicted effective diffusion coefficient increases with increasing
ionic strength and temperature. The effect is significant over the
ranges of 0.01 to 0.6 mol/L and 22 to 60 °C, respectively, which is
consistent with the findings of previous studies [20, 26]. The PDP
analysis therefore provides interpretable insight into the
diffusion law and mechanism.
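
The percentage changes quoted for the rock capacity factor and the compacted dry density can be checked with a short calculation. The sketch assumes that each percentage is the change in the PDP value relative to the magnitude of its starting value; this convention is inferred from the quoted numbers rather than stated in the text, and the function name is hypothetical.

```python
def relative_change_percent(start, end):
    """Change from start to end as a percentage of the magnitude of start."""
    return (end - start) / abs(start) * 100.0


# PDP endpoint values quoted in the text for Figs. 5A and 5B.
pdp_endpoints = {
    "rock capacity factor": (-11.16, -9.90),
    "compacted dry density": (-9.85, -10.74),
}

for feature, (start, end) in pdp_endpoints.items():
    print(f"{feature}: {relative_change_percent(start, end):+.1f}%")
```

Under this assumption the calculation gives +11.3% and -9.0%, matching the values reported for these two features.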


IV. CONCLUSION 


The effective diffusion coefficients of ReO₄⁻, HCrO₄⁻, and I⁻ in
compacted Gaomiaozi and Anji bentonites under various ionic
strength and compacted dry density conditions were investigated
using a through-diffusion method and six machine-learning models.
Based on the results, the main findings of this study can be
summarized as follows:

(i) The training and validation datasets were obtained from two
sources: experimental instances and pseudo-instances. Models
trained on the former achieved a higher predictive accuracy than
those trained on the latter.

(ii) The Light Gradient-Boosting algorithm demonstrated a higher
predictive accuracy than the other machine-learning algorithms,
achieving an MSE of 0.01 and R² of 0.92 for the dataset obtained
from the JAEA-DDB and 15 publications.

(iii) Analyses of the input features using the Shapley Additive
Explanations, Feature Importance, and Partial Dependence Plot
methods revealed that the rock capacity factor and compacted dry
density were the two most important features. The rock capacity
factor had a positive influence, whereas the compacted dry density
had a negative influence.

In this paper, a novel machine-learning model that predicts
radionuclide diffusion with high accuracy is introduced, and the
diffusion mechanism is explored by ranking the influencing factors
and analyzing the dependence of the effective diffusion coefficient
on each of them. The results suggest that machine-learning
algorithms can be powerful tools, offering a new paradigm for
studying the diffusion of radioactive anions in bentonite barriers.
Further research is necessary to evaluate the applicability of this
method and to improve the machine-learning models by incorporating
additional characteristic parameters of bentonite, complex chemical
species, and a broader range of geochemical conditions related to
high-level radioactive waste repositories.